Energizing Change: Electric Vehicle Rise in Switzerland

Author

Pierre Vernet, Alexis Jeanneret, Hugo Troendle, Urs Hurni

Published

November 19, 2023

1 Introduction

1.1 Background and Motivation

In an era marked by unprecedented environmental challenges, our world is at a critical juncture where sustainable practices are no longer an option but a necessity. Climate change, driven by anthropogenic activities, poses an imminent threat to the delicate ecological balance. Recognizing the urgency of this global issue, we are compelled to investigate the specific context of Switzerland, the country we live in, and which has a lot of financial capacities to change rapidly.

The automotive sector plays a pivotal role in shaping the a sustainable world, and our focus on electric vehicle adoption in Switzerland reflects a strategic choice to comprehend the intricate interplay of factors driving this transition.

The motivations underlying our research stem from a multifaceted perspective. Firstly, the automotive industry is undergoing a transformative shift globally, with electric vehicles emerging as a key solution to reduce carbon emissions. By narrowing our focus to Switzerland, we aim to provide nuanced insights into the factors influencing EV adoption, offering a unique perspective on the socio-economic, environmentally stable and financially comfortable Swiss context. The adoption of electric vehicles provides an insight into people’s inclination to make efforts towards sustainability. This emphasizes the crucial role that electric vehicle adoption plays in evaluating society’s commitment to sustainable practices.

Secondly, our commitment to environmental sustainability extends beyond a generic interest. The urgency of addressing climate change has never been more palpable, and our research seeks to contribute to the discussion on sustainable mobility.

Moreover, we believe the future is encapsulated in our dedication to understanding the dynamics of environmentally conscious consumer behavior. By unraveling the complexities of EV adoption in Switzerland over the years (2005-2022), we aim to offer valuable insights that could guide policymakers, businesses, and consumers towards more informed and sustainable choices.

We believe that by comprehensively analyzing the impact of external factors on the adoption of electric vehicles in Switzerland, we can contribute meaningfully to the ongoing global efforts towards a more sustainable and resilient future. Through this project, we aspire to inspire positive change, foster innovation, and advocate for a paradigm shift towards eco-friendly practices in the automotive sector and beyond.

1.3 Reseach Questions

    1. Given fluctuations in oil prices, demographic shifts, and major policy, which of these factors have a direct causal impact on the shifts in the adoption of electric vehicles in Switzerland?
    1. Based on past electric vehicle adoption trends in Switzerland, can we forecast future adoption rates and pinpoint times of significant increases or decreases correlated with major events or policy changes?
    1. In comparing regions in Switzerland, which areas show higher or lower adoption of electric vehicles, and how does this regional adoption align or vary with external factors like oil price changes, political opinions, and demographic shifts?
    1. How has the growth of electric vehicles evolved in comparison to other countries such as France, and what factors might account for the differences in their evolution ?
    1. To what extent does the evolution in the availability of charging stations exert an influence on the adoption of electric vehicles in Switzerland?

2 Data

2.1 Raw Datasets

Code
# Reading the data
vehicle_data_2005_2008 <- read.csv("../data/road_vehicle_CH_2005-2008.csv")
vehicle_data_2009_2022 <- read.csv("../data/road_vehicle_CH_2009-2022.csv")
google_trends_data1 <- read.csv("../data/googletrends_auto-elettrica_2005-2022.csv")
google_trends_data2 <- read.csv("../data/googletrends_elektro-auto_2005-2022.csv")
google_trends_data3 <- read.csv("../data/googletrends_elektrofahrzeug_2005-2022.csv")
google_trends_data4 <- read.csv("../data/googletrends_elektromobil_2005-2022.csv")
google_trends_data5 <- read.csv("../data/googletrends_eletric-car_2005-2022.csv")
google_trends_data6 <- read.csv("../data/googletrends_EV_2005-2022.csv")
google_trends_data7 <- read.csv("../data/googletrends_vehicule-electrique_2005-2022.csv")
google_trends_data8 <- read.csv("../data/googletrends_voiture-electrique_2005-2022.csv")
oil_prices_data <- read.csv("../data/BrentOilPrices.csv")
demographic_data <- read.csv("../data/demographic.csv")
charging_station <- read.csv("../data/charging_station.csv")
france_v <- read_excel("../data/parc_vp_france_2022.xlsx",sheet = 2)

For this phase of the project, our approach involves presenting diverse datasets in a table form. To achieve this, we used the kable function from the kableExtra package, resulting in a table that displays each variable along with its corresponding definition.

2.1.1 Swiss Vehicle Registration Dataset

This data set explains the new registrations of road vehicles by vehicle group, canton, vehicle type, fuel, month and year in Switzerland. This data set is essential to evaluate the EVs’ evolution within our chosen location.

Code
# Create a tibble with variable descriptions
variable_table_vehicle <- tibble(
  Variables = c("Canton", "Vehicle group / type", "Fuel", "Month", "2009-2022"),
  Meaning = c(
    "The region in Switzerland.",
    "Type or group of the vehicle.",
    "Type of fuel used by the vehicle.",
    "The month of the data.",
    "Number of vehicles for each respective year. Each with it's own column"
  )
)

# Display the table using kableExtra
variable_table_vehicle %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
Canton The region in Switzerland.
Vehicle group / type Type or group of the vehicle.
Fuel Type of fuel used by the vehicle.
Month The month of the data.
2009-2022 Number of vehicles for each respective year. Each with it's own column
  • Source https://www.bfs.admin.ch/bfs/en/home/statistics/catalogues-databases/data.assetdetail.24105892.html

2.1.2 Oil price Dataset

This data set explains the evolution of the price of brent oil over time. From this data set, we will be able to obtain precious information to compute our linear regressions concerning oil price and EVs adoptions.

Code
# Create a tibble with variable descriptions
variable_table <- tibble(
  Variables = c("Date", "Price"),
  Meaning = c(
    "The date when the price was recorded. In a day-month-year format.",
    "The price of Brent Oil on the given date."
  )
)

# Display the table using kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
Date The date when the price was recorded. In a day-month-year format.
Price The price of Brent Oil on the given date.
  • source https://www.kaggle.com/datasets/mabusalah/brent-oil-prices

2.1.3 Swiss Demographics Dataset

While this dataset contains a substantial amount of information, only a fraction of it will be pertinent to our research. It explains the evolution of the Swiss population over time through variables given below.

Code
# Create a tibble with variable descriptions
variable_table <- tibble(
  Variables = c(
    "Year", "Citizenship (category)", "Sex", "Age", "Population on 1 January", 
    "Live birth", "Death", "Natural change", 
    "Immigration incl. change of population type", "Emigration", 
    "Net migration incl. change of population type", 
    "Change of population type", "Acquisition of Swiss citizenship", 
    "Gender change in the civil register (entry)", 
    "Gender change in the civil register (exit)", "Statistical adjustment", 
    "Population on 31 December", "Population change"
  ),
  Meaning = c(
    "The year of the demographic data.",
    "The categorization of the citizenship status.",
    "Gender category.",
    "Age category.",
    "Population count at the beginning of the year.",
    "Number of births in the year.",
    "Number of deaths in the year.",
    "Change in the population due to births and deaths.",
    "Number of immigrants, including change of population type.",
    "Number of emigrations.",
    "Net migration count, including change of population type.",
    "Change in the categorization of the population.",
    "Number of individuals who acquired Swiss citizenship.",
    "Number of gender changes registered (entry).",
    "Number of gender changes registered (exit).",
    "Adjustments made to the data for accuracy.",
    "Population count at the end of the year.",
    "Change in population over the year."
  )
)

# Display the table using kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
Year The year of the demographic data.
Citizenship (category) The categorization of the citizenship status.
Sex Gender category.
Age Age category.
Population on 1 January Population count at the beginning of the year.
Live birth Number of births in the year.
Death Number of deaths in the year.
Natural change Change in the population due to births and deaths.
Immigration incl. change of population type Number of immigrants, including change of population type.
Emigration Number of emigrations.
Net migration incl. change of population type Net migration count, including change of population type.
Change of population type Change in the categorization of the population.
Acquisition of Swiss citizenship Number of individuals who acquired Swiss citizenship.
Gender change in the civil register (entry) Number of gender changes registered (entry).
Gender change in the civil register (exit) Number of gender changes registered (exit).
Statistical adjustment Adjustments made to the data for accuracy.
Population on 31 December Population count at the end of the year.
Population change Change in population over the year.
  • source https://www.bfs.admin.ch/bfs/en/home/statistics/catalogues-databases/data.assetdetail.26645079.html

2.1.4 Google trend data

This Data set shows the evolution of Google trends on 8 different terms. We chose these terms in all 3 different main Swiss national languages (French, Italian and German) to make sure we get the most accurate data possible. Those variables will help us highlight potential tendencies/trends.

Code
# Create a tibble with variable descriptions
variable_table_google_trend <- tibble(
  Variables = c("Month", 
                "Search Term: Voiture électrique", 
                "Search Term: Véhicule électrique", 
                "Search Term: EV", 
                "Search Term: Electric Car", 
                "Search Term: Elektromobil", 
                "Search Term: Elektrofahrzeug", 
                "Search Term: Elektro Auto", 
                "Search Term: Auto Elettrica"),
  Meaning = c(
    "The month of the data",
    "Google Trends data for 'Voiture électrique: (Suisse)'",
    "Google Trends data for 'Véhicule électrique: (Suisse)'",
    "Google Trends data for 'EV: (Suisse)'",
    "Google Trends data for 'Electric car: (Suisse)'",
    "Google Trends data for 'Elektromobil: (Suisse)'",
    "Google Trends data for 'Elektrofahrzeug: (Suisse)'",
    "Google Trends data for 'Elektro Auto: (Suisse)'",
    "Google Trends data for 'Auto Elettrica: (Suisse)'"
  )
)
# Display the table using kableExtra
variable_table_google_trend %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
Month The month of the data
Search Term: Voiture électrique Google Trends data for 'Voiture électrique: (Suisse)'
Search Term: Véhicule électrique Google Trends data for 'Véhicule électrique: (Suisse)'
Search Term: EV Google Trends data for 'EV: (Suisse)'
Search Term: Electric Car Google Trends data for 'Electric car: (Suisse)'
Search Term: Elektromobil Google Trends data for 'Elektromobil: (Suisse)'
Search Term: Elektrofahrzeug Google Trends data for 'Elektrofahrzeug: (Suisse)'
Search Term: Elektro Auto Google Trends data for 'Elektro Auto: (Suisse)'
Search Term: Auto Elettrica Google Trends data for 'Auto Elettrica: (Suisse)'
  • source https://trends.google.fr/trends/explore?q=voiture%20%C3%A9lectrique&date=now%201-d&geo=CH&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=v%C3%A9hicule%20%C3%A9lectrique&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=EV&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=Electric%20car&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=Elektromobil&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=Elektrofahrzeug&hl=fr
    https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=Elektro%20Auto&hl=fr https://trends.google.fr/trends/explore?date=now%201-d&geo=CH&q=Auto%20elettrica&hl=fr

2.1.5 French vehicle registration data

We decided to add this data set to help us compare with another country. Indeed, France is an adjacent country which might imply potential similarities in the outcome. This data set is made of variables such as the Date of the French vehicle registration or whether it is from a professional user or not. The information we are interested in concern the number of registered vehicles in the country by propulsion method over time.

Code
# Create a tibble with variable descriptions
variable_table <- tibble(
  Variables = c("Date", "Ensemble des voitures particulieres", "Professionnel", "Particulier", "Crit'Air"),
  Meaning = c(
    "The date, in year format, in which the observations are registered",
    "The sum of vehicle in every groups",
    "Category 'Professional' of vehicle",
    "Category 'Passenger Car' of vehicle",
    "Categorizes vehicles by emissions with from best to worst classes"
  )
)

# Display the table using kableExtra
variable_table %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
Date The date, in year format, in which the observations are registered
Ensemble des voitures particulieres The sum of vehicle in every groups
Professionnel Category 'Professional' of vehicle
Particulier Category 'Passenger Car' of vehicle
Crit'Air Categorizes vehicles by emissions with from best to worst classes
  • source https://www.statistiques.developpement-durable.gouv.fr/donnees-sur-le-parc-de-vehicules-en-circulation-au-1er-janvier-2022

2.1.6 Chargin Stations

This data set is meant to determine the number of charging stations per canton, per charging power, per month, per plug type in Switzerland since November 2020. It is important to mention that a station can have multiple charging points. These values will help us to obtain more information about the Swiss adaption to EVs and their potential effects on EVs adoption by the Swiss population

Code
# Create a tibble with variable descriptions in English
variable_table_charging <- tibble(
  Variables = c("year",
                "month",
                "stations_CH_count",
                "stations_XY_count",
                "locations_CH_count",
                "locations_XY_count",
                "plugs_CH_count",
                "plugs_XY_count",
                "chargingPower_CH_sum",
                "chargingPower_XY_sum",
                "chargingPower_CH_count",
                "chargingPower_XY_count",
                "chargingPower_10kW_count",
                "chargingPower_21kW_count",
                "chargingPower_42kW_count"),
  Meaning = c(
    "Year of the key figures collection.",
    "Month of the key figures collection. These are collected daily and published monthly.",
    "Number of available charging stations in Switzerland. A station can have multiple charging points.",
    "Number of charging stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.",
    "Number of stations in Switzerland. A station can have multiple charging points.",
    "Number of stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.",
    "Number of charging plugs in Switzerland.",
    "Number of charging plugs per plug type in Switzerland. This attribute is available for all plug types, XY corresponding to the plug type according to the Open Intercharge Protocol (OICP) version 2.2 (see below).",
    "Total maximum charging power in kilowatts of recharge stations in Switzerland. Only charging points with known power are considered (see chargingPower_CH_count).",
    "Total maximum charging power in kilowatts of recharge stations in a canton. Only charging points with known power are considered (see chargingPower_XY_count).",
    "Number of recharge stations in Switzerland with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_CH_sum).",
    "Number of recharge stations per canton with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_XY_sum).",
    "Number of recharge stations in Switzerland with known maximum charging power delivering a maximum charging power of 10 kW.",
    "Number of recharge stations in Switzerland with known maximum charging power ranging from over 10 kW to 21 kW.",
    "Number of recharge stations in Switzerland with known maximum charging power ranging from over 21 kW to 42 kW."
  )
)

# Display the table using kableExtra
variable_table_charging %>%
  kbl() %>%
  kable_styling(position = "center")
Variables Meaning
year Year of the key figures collection.
month Month of the key figures collection. These are collected daily and published monthly.
stations_CH_count Number of available charging stations in Switzerland. A station can have multiple charging points.
stations_XY_count Number of charging stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.
locations_CH_count Number of stations in Switzerland. A station can have multiple charging points.
locations_XY_count Number of stations per canton. A station can have multiple charging points. This attribute is available for all cantons, XY representing the official canton abbreviation.
plugs_CH_count Number of charging plugs in Switzerland.
plugs_XY_count Number of charging plugs per plug type in Switzerland. This attribute is available for all plug types, XY corresponding to the plug type according to the Open Intercharge Protocol (OICP) version 2.2 (see below).
chargingPower_CH_sum Total maximum charging power in kilowatts of recharge stations in Switzerland. Only charging points with known power are considered (see chargingPower_CH_count).
chargingPower_XY_sum Total maximum charging power in kilowatts of recharge stations in a canton. Only charging points with known power are considered (see chargingPower_XY_count).
chargingPower_CH_count Number of recharge stations in Switzerland with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_CH_sum).
chargingPower_XY_count Number of recharge stations per canton with known maximum charging power and for which the total maximum charging power has been taken into account (chargingPower_XY_sum).
chargingPower_10kW_count Number of recharge stations in Switzerland with known maximum charging power delivering a maximum charging power of 10 kW.
chargingPower_21kW_count Number of recharge stations in Switzerland with known maximum charging power ranging from over 10 kW to 21 kW.
chargingPower_42kW_count Number of recharge stations in Switzerland with known maximum charging power ranging from over 21 kW to 42 kW.
  • source https://www.bfe.admin.ch/bfe/fr/home/approvisionnement/statistiques-et-geodonnees/geoinformation/geodonnees/mobilite/bornes-de-recharge-pour-la-mobilite-electrique.html

2.2 Data Wrangling

2.2.1 Data Wrangling : Oil clean

Initially, while attempting to format the dataset, we discovered the generation of numerous NAs. Upon delving deeper into the data, we identified two distinct date formats: the first being [DD-MMM-YYYY] and the second [MMM DD, YYYY]. Consequently, the initial date formatting was ineffective. Therefore, we undertook the task of matching the two tables with the same date format.

Another issue we encountered was to present the date in a standardized form across all our dataset to perform time series analyses. For that, we had to change the abbreviated months names (i.e. Jan, Feb, etc.) into numbers.

Code
# Change date column into Day

# Two types of format: one format before 22.04.2020,a different format after

# 22.04.2020 is on row 8361
# Format 1: 15-Apr-2020
oil_df_1 <- oil_prices_data[0:8360,] |>
  separate(Date, into = c("Day", "Month", "Year"), sep = "-")

# Format 2 : Apr 22, 2020
oil_df_2 <- oil_prices_data[8361:nrow(oil_prices_data),] |>
  separate(Date, into = c("Month", "Day", "Year"), sep = " |, ")

# Check for NAs
print(which(rowSums(is.na(oil_df_1)) > 0))
#> named integer(0)
print(which(rowSums(is.na(oil_df_2)) > 0))
#> named integer(0)

# We don't need dates before 2005
oil_df_1 <- oil_df_1[4486:nrow(oil_df_1),]

# Years in same format for both
oil_df_1$Year <- paste0("20", oil_df_1$Year)

# Change the order of both dataframes
oil_df_1 <- oil_df_1 %>%
  select("Year", "Month", "Day", "Price")

oil_df_2 <- oil_df_2 %>%
  select("Year", "Month", "Day", "Price")

# Merging both dataframes together
oil_df <- rbind(oil_df_1, oil_df_2)

# We want all our data to have the same date format #YYYY-MM-DD

# First we create a df to change the current month format to numbers
months_numbers <- data.frame(
  current_month_format = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"),
  numbers = 1:12
)

# Then we update our oil_data
oil_df <- oil_df %>%
  left_join(months_numbers, by = c("Month" = "current_month_format")) %>%
  select(Year, numbers, Day, Price) %>%
  rename(Months = numbers)

# Regroup the date in 1 column
oil_df <- oil_df %>%
  mutate(Date = as.Date(paste(Year, Months, Day, sep = "-"))) %>%
  select(Date, Price)


# Displayed clean data
reactable(oil_df,
          sortable = TRUE, 
          searchable = TRUE)

2.2.2 Data Wrangling : Google trend clean

Comprehending Google Trends information involves understanding the contextualization of the provided figures. The given data is standardized which limits the range of information we can obtain from it.

The crucial element for effective narration lies in utilizing the normalized Trends data. This normalization indicates that when observing search interest trends over time, it is interpreted as a ratio relative to all searches conducted on all topics on Google during that specific period and location. Likewise, when assessing regional search interest for a specific topic, it is construed as the search interest for that particular topic in a given region relative to all searches on all topics on Google in that same place and time.

Source

To clean and process eight Google Trends datasets related to different search terms, we created a function process_dataset() that reads each CSV file, removes date indices, excludes the first row, checks for missing values, renames columns, converts types, and standardizes the date format. Then, we iterated through each dataset, merged them based on the “Date” column, summed values for each search term per date, normalized these values between 1 and 100.

Code
dataset_names <- c(
  "googletrends_auto-elettrica_2005-2022.csv",
  "googletrends_elektro-auto_2005-2022.csv",
  "googletrends_elektrofahrzeug_2005-2022.csv",
  "googletrends_elektromobil_2005-2022.csv",
  "googletrends_eletric-car_2005-2022.csv",
  "googletrends_EV_2005-2022.csv",
  "googletrends_vehicule-electrique_2005-2022.csv",
  "googletrends_voiture-electrique_2005-2022.csv"
)

# Function to process each dataset
process_dataset <- function(file_path) {
  # Read the dataset
  google_trends_data <- read.csv(file_path)

  # Remove the dates from the index
  google_trends_data <- google_trends_data |> rownames_to_column(var = "Date")

  # Remove the first row using slice
  google_trends_data <- slice(google_trends_data, -1)

  # Check for NA
  cat("NA count for", file_path, ": ", sum(is.na(google_trends_data$Date)), "\n")
  cat("NA count for SearchCount in", file_path, ": ", sum(is.na(google_trends_data$Catégorie...Toutes.catégories)), "\n")

  # Rename col
  colnames(google_trends_data)[2] <- "SearchCount"

  # Convert to numeric
  google_trends_data$SearchCount <- as.numeric(google_trends_data$SearchCount)

  # Convert the column to date with the desired format
  google_trends_data$Date <- as.Date(paste(google_trends_data$Date, "01", sep = "-"))

  return(google_trends_data)
}

# Process each dataset and store in a list
processed_datasets <- list()

for (dataset_name in dataset_names) {
  file_path <- file.path("../data", dataset_name)
  processed_dataset <- process_dataset(file_path)
  processed_datasets[[dataset_name]] <- processed_dataset
}
#> NA count for ../data/googletrends_auto-elettrica_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_auto-elettrica_2005-2022.csv :  0 
#> NA count for ../data/googletrends_elektro-auto_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_elektro-auto_2005-2022.csv :  0 
#> NA count for ../data/googletrends_elektrofahrzeug_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_elektrofahrzeug_2005-2022.csv :  0 
#> NA count for ../data/googletrends_elektromobil_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_elektromobil_2005-2022.csv :  0 
#> NA count for ../data/googletrends_eletric-car_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_eletric-car_2005-2022.csv :  0 
#> NA count for ../data/googletrends_EV_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_EV_2005-2022.csv :  0 
#> NA count for ../data/googletrends_vehicule-electrique_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_vehicule-electrique_2005-2022.csv :  0 
#> NA count for ../data/googletrends_voiture-electrique_2005-2022.csv :  0 
#> NA count for SearchCount in ../data/googletrends_voiture-electrique_2005-2022.csv :  0

# Merge datasets based on the "Date" column
merged_data <- reduce(processed_datasets, left_join, by = "Date")

# Sum the values for each search term for a given date
df_summed <- merged_data %>%
  rowwise() %>%
  mutate(Sum_SearchCount = sum(c_across(starts_with("SearchCount"))))

# Normalize the Sum_SearchCount values between 1 and 100
df_summed$SearchRatio <- df_summed$Sum_SearchCount / 8

# Print the result
df_gtrends <- df_summed[,-(2:10)]

# Displayed clean data
reactable(df_gtrends,
          sortable = TRUE, 
          searchable = TRUE)

2.2.3 Cleaning of demographic_data

The main issue with this data set was to transform it in a way that makes it useful for our analysis. We first had to remove all rows containing category summaries in the middle of the data.

We then formulated to following hypothesizes:

  • Male and Female present the same attitude towards electric vehicles
  • Swiss and Foreign citizen living in Switzerland present the same attitude towards electric vehicles
  • People who are not allowed to drive do not have any impact on electric vehicles use

We then decided to do the following cleaning:

  • Only kept the “Total” for the Male-Female category
  • Only kept the “Total” for the Swiss-Foreign category
  • Only kept people aged between 18 and 99. (People aged over 99 account for 0.02% of the population)

Then, instead of keeping a raw data set with all ages between 18 and 99, we decided to group them in widely accepted age segments, namely Generation Z, Millennials, Generation X and Baby boomers. This categorization will help us to analyse whether different generations have a different approach to their mobility or not.

Code
# We only keep the data from 2005-2022 (time frame of our analysis)

demographic_data <- demographic_data[-c(1:31212),]

# We do not differentiate Male-Female for the purpose of this analysis -> we only keep the "total" rows
filtered_demographic <- demographic_data %>%
  filter(grepl("total",demographic_data$Sex))

# We do not differentiate Swiss citizenship - Foreign citizenship for the purpose of this analysis -> we only keep the "total" rows
filtered_demographic <- filtered_demographic %>%
  filter(grepl("total", filtered_demographic$Citizenship..category.))

# People under 18 cannot drive -> We remove summary rows +  people under 18 + people 99 and above (anecdotal observations)
filtered_demographic <- filtered_demographic %>%
  filter(
    !grepl("total",filtered_demographic$Age),
    between(as.numeric(gsub(" years","",filtered_demographic$Age)), 18, 98)
  )
#> Warning: There was 1 warning in `filter()`.
#> i In argument: `between(...)`.
#> Caused by warning in `between()`:
#> ! NAs introduced by coercion

# We keep the interesting columns
filtered_demographic <- filtered_demographic %>%
  select(c("Year","Age","Population.on.1.January"))

## To perform meaningful analyses on the data, we group the ages by generations

# remove " years" from the age column
filtered_demographic$Age <- as.numeric(gsub(" years", "",filtered_demographic$Age))

# Create bins with the different generations:
filtered_demographic$Generation <- cut(filtered_demographic$Age,
                                       breaks = c(17, 26, 42, 58, Inf),
                                       labels = c("Generation Z", "Millenials","Generation X", "Baby Boomers"),
                                       include.lowest = TRUE)
# we format the table wider
filtered_demographic <- filtered_demographic %>%
  select(-"Age")
  
df_demographic <- filtered_demographic %>%
  pivot_wider(names_from = Generation, values_from = Population.on.1.January, values_fn = sum)


# Displayed clean data
reactable(df_demographic,
          sortable = TRUE, 
          searchable = TRUE)

2.2.4 Data Wrangling : Swiss vehicle clean

Redundancy

The dataset, representing new vehicle registrations in Switzerland from 2005 onwards, exhibited redundancy in its structure. For example, the data contained both main categories (like ‘> Passenger cars’) and their respective sub-categories (‘… Passenger car’ and ‘.. Heavy passenger car’). On examination, it was observed that the counts under the main categories were simply aggregates of the counts of their sub-categories. Such redundancy could lead to double counting in analytical processes.

We addressed this issue by removing main categories that were aggregates of their sub-categories, preventing potential double counting in analyses. For two subsets of data (2005 to 2008 and 2009 to 2022), We filtered out main categories, cleaned and standardized the column names, reshaped the data set from wide to long format to facilitate analysis, and extracted and formatted the year information for each observation. This process allowed for a more granular and consistent data set for further analysis and interpretation.

Code
##### 1st data 2005 to 2008 clean #####
# remove the main categories to avoid  and to work with a more granular dataset.
# Filter out main categories (rows starting with '>')
v_2005_2008 <- vehicle_data_2005_2008 |>
  filter(!grepl("^>", trimws(Vehicle.group...type)))
#rename the Vehicle.group...type column to VehicleType and clean the observations' names
v_2005_2008 <- v_2005_2008 |>
  rename(Location = Canton,
         VehicleType = Vehicle.group...type) |>
  mutate(VehicleType = str_remove(VehicleType, "^\\.\\.\\.\\s*"))


# Year Extraction, Reshape the dataset from wide to long format
v_2005_2008 <- v_2005_2008 |>
  pivot_longer(cols = `X2005`:`X2008`, names_to = "Year", values_to = "Count")
# Remove the "X" prefix from the Year column
v_2005_2008$Year <- gsub("X", "", v_2005_2008$Year)


##### 2nd data 2009 to 2022 clean #####
# Filter out main categories (rows starting with '>')
v_2009_2022 <- vehicle_data_2009_2022 |>
  filter(!grepl("^>", trimws(Vehicle.group...type)))
#rename the Vehicle.group...type column to VehicleType and clean the observations' names
v_2009_2022 <- v_2009_2022 |>
  rename(Location = Canton,
         VehicleType = Vehicle.group...type) |>
  mutate(VehicleType = str_remove(VehicleType, "^\\.\\.\\.\\s*"))

# Year Extraction, Reshape the dataset from wide to long format
v_2009_2022 <- v_2009_2022 |>
  pivot_longer(cols = `X2009`:`X2022`, names_to = "Year", values_to = "Count")
# Remove the "X" prefix from the Year column
v_2009_2022$Year <- gsub("X", "", v_2009_2022$Year)

Merging and Joining

As the vehicle data was split across multiple files, it was necessary to combine them for a holistic view.

We used bind_rowsto create df_v, then sorted the dataset by yearto ensure a structured chronological view of the data.

Code
df_v <- bind_rows(v_2005_2008, v_2009_2022)
df_v <- df_v |>
  arrange(df_v$Year)

Refinement and Vehicle Classification Insights

  • Some of the canton names have special characters that haven’t been encoded properly in the dataset, causing them to display as hexadecimal escape sequences. To address this, we used iconv to properly encode these characters from “latin1” to “UTF-8”. Additionally, when multiple names were provided for cantons (separated by slashes), we retained only the first name using sapply and str_split. We standardized the canton names using a predefined list (standard_names).
  • We created a unified date column by combining ‘Year’ and ‘Month’ information. First, we matched month names to their corresponding numeric values, then merged this information with the year, and formatted it as “YYYY-MM-01” to generate a standard date format.This transformation streamlines time series analyses and provides a more intuitive representation of the data’s temporal dimension.

We simplified the classification of hybrid vehicles by merging different categories of conventional and plug-in hybrids into two main groups: “Conventional hybrid” and “Plug-in hybrid”, respectively. This simplification aimed to streamline analysis by grouping similar types together.

The “Gas” fuel type was renamed from “Gas (monovalent and bivalent)” for clarity and consistency in the dataset.

Further context was provided on different hybrid types:

Conventional Hybrids: These vehicles, powered by petrol or diesel without plug-in capability, might exhibit different adoption trends due to their longer existence in the market.Indeed, they have been around for longer and may have a different adoption trend compared to newer technologies.

Plug-in Hybrids: With larger batteries and the ability to charge from an electric outlet, these vehicles enable all-electric operation for limited distances. The presence of charging infrastructure might impact their adoption differently, distinguishing them from conventional hybrids. Adoption might be reflected differently due to the need for charging infrastructure.

Code
#### canton ####
df_v$Location <- iconv(df_v$Location, from = "latin1", to = "UTF-8")

# keep only the first name when multiple names are provided for cantons (separated by slashes)
df_v$Location <- sapply(str_split(df_v$Location, " / "), `[`, 1)

# Standardizing the cantons names:
standard_names <- c("Switzerland" = "Switzerland", "Zürich" = "ZH", "Bern" = "BE", "Luzern" = "LU", "Uri" = "UR", "Schwyz" = "SZ",
                    "Obwalden" = "OW", "Nidwalden" = "NW","Glarus" = "GL", "Zug" = "ZG", "Fribourg" = "FR", "Solothurn" = "SO",
                    "Basel-Stadt" = "BS", "Basel-Landschaft" = "BL", "Schaffhausen" = "SH", "Appenzell Ausserrhoden" = "AR",
                    "Appenzell Innerrhoden" = "AI", "St. Gallen" = "SG", "Graubünden" = "GR", "Aargau" = "AG", "Thurgau" = "TG",
                    "Ticino" = "TI", "Vaud" = "VD", "Valais" = "VS", "Neuchâtel" = "NE", "Genève" = "GE", "Jura" = "JU",
                    "Confederation" = "Confederation")
df_v$Location <- standard_names[df_v$Location]

#### Year ####
#create date col
# Convert month names to month numbers
df_v$MonthNum <- match(df_v$Month, month.name)
# Combine Year, MonthNum, and "01" to create a date in the format "YYYY-MM-01"
df_v$Date <- as.Date(paste(df_v$Year, df_v$MonthNum, "01", sep = "-"), format = "%Y-%m-%d")
# Drop the Month, Year, and MonthNum columns
df_v <- df_v %>% select(-Month, -Year, -MonthNum)


#### renaming ####
#renaming the 4 different hybrid name to only two
df_v <- df_v |>
  mutate(Fuel = case_when(
    Fuel %in% c("Petrol-electricity: conventional hybrid", "Diesel-electricity: conventional hybrid") ~ "Conventional hybrid",
    Fuel %in% c("Petrol-electricity: plug-in hybrid", "Diesel-electricity: plug-in hybrid") ~ "Plug-in hybrid",
    TRUE ~ Fuel # Keeps all other fuel types as they are
  ))

#renaming fuel type: gas
df_v <- df_v |>
  mutate(Fuel = if_else(Fuel == "Gas (monovalent and bivalent)", "Gas", Fuel))
Code
#count nbr of vehicle for a paticular year and 
vehicle_count_2022 <- df_v |>
  filter(Location == "Switzerland", year(Date) == 2023, VehicleType == "Passenger car") |>
  summarize(TotalCount = sum(Count))

Rows with 0

Retaining rows with a Count of 0 ensures the completeness of our dataset, indicating periods when specific vehicle types or fuel categories had no registrations. This not only provides a holistic view of vehicle adoption trends over time but also aids in generating continuous time series visualizations without gaps, offering a true representation of the data.

Code
# Displayed clean data
#reactable(df_v, sortable = TRUE, searchable = TRUE)

2.2.5 Data wrangling French vehicle clean

This dataset will be used to compare our data in Switzerland with a bordering country (FR)

It is important to note that this data set contains all registered vehicles in a given year (vs. all new registered cars for our Swiss data) We will only keep the vehicles used for personal use (not the ones for professional use), to match our Swiss dataset

We decided to remove some anecdotal observations (i.e Hydrogen, Unknown, etc.) as their production / use is marginal. Moreover, we have decided to regroup the different kind of hybrid motors into two Hybrid categories. The 2 categories are the plugable hybrid vehicle HR (petrol or diesel) and the non-plugable hybrid vehicles HNR (petrol or diesel). This was done to match our Swiss data set.

The main issue with this data set was to set it in a way where it is comparable to our Swiss cars data set. Indeed, this dataset presents the total of all cars in France for a given year and type, while our Swiss data set accounts for new registration of cars for a given year and type.

To be able to compare two data sets, we decided to format the French data set in the following way:

  • Remove categories that are not relevant for our analysis, and widely under-represented in the population (i.e Gaz, hydrogen)
  • Focus on the private use of cars (vs. professional), because the decision to use a certain type of cars for professional reasons are often explained by factors hard to capture in our analysis (i.e bundle deals, greenwashing). We therefore hypothesized that private use of a certain type of car better gives insights on population view.
  • The match our swiss data set (new registration per year), we computed the difference of total cars registered in France per years. Having only the data between 2011 and 2022, we have “lost” the insights for 2011. Indeed, when creating the deltas (difference) columns, 2011 deltas could not be computed (no data for 2010), we therefore decided to delete that year from our data
Code

# the first three rows are empty
france_v <- france_v[- c(1,2,3),]

# Only taking the private cars from this data
france_private <- france_v[25:37,]


# pivoting to match the Swiss data set and correctly setting the column names
france_private <- t(france_private)
colnames(france_private) <- france_private[1,]
france_private <- france_private[-1,]

# Removing "Particulier" ( =aggregate of all categories), "Gaz" (<1% of cars), "Unknown" (muddies the data), "Hydrogen (anecdotal observations)
france_private <- as_tibble(france_private)

france_private <- france_private %>%
  select(-c("Particulier", "Gaz", "Gaz HNR", "Gaz HR", "Hydrogène et autre ZE", "Inconnu"))

# Add info about the year
france_private <- france_private %>%
  mutate(Year = 2011:2022)

# Removing the decimals due to previous manipulations
for (i in 1:(ncol(france_private) - 2)) {  # Exclude the last two columns (Year and Hybrid)
  france_private[12, i] <- as.character(floor(as.numeric(france_private[12, i])))
}

# To match swiss data set: creating the Conventional Hybrid and Plug-in Hybrid categories (aggregating the different kinds of hybrid) and removing the aggregated columns
france_private <- france_private %>%
  mutate(
    Conventional_Hybrid = as.numeric(`Diesel HNR`)+ as.numeric(`Essence HNR`),
    Plug_in_Hybrid = as.numeric(`Diesel HR`) + as.numeric(`Essence HR`)
  )

france_private <- france_private %>%
  select(Diesel, Essence, Conventional_Hybrid, Plug_in_Hybrid, Electrique, Year)

# To match our Swiss data set, we need to deltas (change) between each years
france_private <- france_private %>%
  mutate(across(c(Diesel, Essence, Conventional_Hybrid, Plug_in_Hybrid, Electrique), as.numeric),
         Diesel_delta = Diesel - lag(Diesel),
         Essence_delta = Essence - lag(Essence),
         Conventional_Hybrid_delta = Conventional_Hybrid - lag(Conventional_Hybrid),
         Plug_in_Hybrid_delta = Plug_in_Hybrid - lag(Plug_in_Hybrid),
         Electrique_delta = Electrique - lag(Electrique))

# Removing the NAs I've created with the lag function (2011 deltas were NA because we don't have data for 2010)
france_private <- france_private[-1,]

# Ordering the columns for more clarity
df_v_fr <- france_private %>%
  select(c("Year", "Diesel", "Diesel_delta", "Essence", "Essence_delta", "Conventional_Hybrid", "Conventional_Hybrid_delta", "Plug_in_Hybrid", "Plug_in_Hybrid_delta", "Electrique", "Electrique_delta"))


#convert date into datetime format
# Assuming 'Year' is an integer column representing the year
df_v_fr$Date <- as.Date(paste(df_v_fr$Year, "-01-01", sep = ""), format = "%Y-%m-%d")
# Rearrange columns with 'Date' as the first column and drop 'year' and 'month'
df_v_fr <- df_v_fr %>%
  select(Date, everything()) %>%
  select(-Year)

# Displayed clean data
reactable(df_v_fr,
          sortable = TRUE, 
          searchable = TRUE)

2.2.6 Availability of charging stations

This data set will be used to explore and analyze the effects of charging station availability on EVs adoption.

However, two important points are to note.

  1. We were unable to find data before November 2020
  2. Charging station availability and EV in market are likely strongly reciprocally correlated.

Moreover, it is important to keep standardized dates and locations

We merges ‘year’ and ‘month’ columns to create a ‘Date’ column in the format “YYYY-MM-01”, subsequently reorganizing the dataframe to position ‘Date’ as the first column and removing the original ‘year’ and ‘month’ columns. ::: {.cell layout-align=“center”}

Code
# Combine 'year' and 'month' columns to create a new 'Date' column
charging_station$Date <- as.Date(paste(charging_station$year, charging_station$month, "01", sep = "-"))

# Rearrange columns with 'Date' as the first column and drop 'year' and 'month'
df_charging_station <- charging_station %>%
  select(Date, everything()) %>%
  select(-year, -month)

# Displayed clean data
reactable(df_charging_station,
          sortable = TRUE, 
          searchable = TRUE)

:::

3 Exploratory data analysis

3.1 Switzerland

3.1.1 sesonality

Code
# Filter for only Passenger car vehicle type
passenger_cars <- df_v |>
  filter(VehicleType == "Passenger car") |>
  group_by(Date) %>%
  summarise(Count = sum(Count, na.rm = TRUE), .groups = 'drop')

# Plotting the data with ggplot2, using Date directly for Passenger cars
ggplot(passenger_cars, aes(x = Date, y = Count)) +
  geom_line() + # Use geom_bar(stat = "identity") if you prefer bar plots
  labs(title = "Passenger Car Adoption Over Time",
       x = "Date",
       y = "Number of Passenger Cars Registered") +
  theme_minimal() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") + 
  geom_smooth(method = "loess", se = FALSE, color = "blue") +# Set date breaks and labels
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels by 45 degrees

The blue line represents a smoothed trend, indicating an initial increase in car registrations until around 2014-2015, followed by a gradual decline. The black line shows the actual number of cars registered, with significant variability. We will call this variation seasonality which is better represented in the following graph.

Code
# Filter for only Passenger car vehicle type
passenger_cars <- df_v %>%
  filter(VehicleType == "Passenger car") |>
  mutate(Year = year(Date), Month = month(Date)) |>
  group_by(Year, Month) %>%
  summarise(Count = sum(Count, na.rm = TRUE), .groups = 'drop') |>
  mutate(Month = factor(Month, levels = 1:12, labels = month.name))  # Convert Month to a factor to ensure proper ordering in the plot

# Plotting the data with ggplot2, showing the trend within each year
ggplot(passenger_cars, aes(x = Month, y = Count, group = Year, color = as.factor(Year))) +
  geom_line(stat = "smooth", se = FALSE, method = "loess", span = 0.5) +  # Smooth the existing lines
  labs(title = "Monthly Passenger Car Registrations by Year",
       x = "Month",
       y = "Number of Passenger Cars Registered") +
  theme_minimal() +
  scale_x_discrete(limits = month.name) +  # Ensure months are shown in order
  scale_color_viridis_d() +  # Use viridis color scale
  theme(legend.position = "bottom", axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels by 45 degrees)

This pattern suggests a seasonal trend with a mid-year peak and a year-end increase. 2020 reacts differently than other years. We suggest that it is probably related to Covid policies.

Code
# Filter for only Passenger car vehicle type
passenger_cars <- df_v |>
  filter(VehicleType == "Passenger car") |>
  mutate(YearMonth = floor_date(Date, "month")) |>
  group_by(YearMonth) %>%
  summarise(Count = sum(Count, na.rm = TRUE), .groups = 'drop') |>
  ungroup() %>%
  mutate(Year = year(YearMonth), 
         Month = month(YearMonth, label = TRUE, abbr = TRUE)) |>
  arrange(Year, Month)

# Plotting the data with ggplot2 without month names on the x-axis
ggplot(passenger_cars, aes(x = as.numeric(format(YearMonth, "%m")), y = Count, group = Year, color = as.factor(Year))) +
  geom_line() +
  facet_wrap(~ Year, scales = "free_y") +
  labs(title = "Seasonal Trends in Passenger Car Registrations",
       x = "Month",
       y = "Number of Passenger Cars Registered") +
  theme_minimal() +
  theme(axis.text.x = element_blank(), # This will remove the month labels
        axis.text.y = element_blank(), # This will remove the month labels
        axis.ticks.x = element_blank(), # This will remove the ticks on the x-axis
        legend.position = "none") # Remove the legend to clean up the plot

This version of the graphs helps us visualizing the evolution of registration for each year individually. As mentioned above, 2020 is the only year which show a decreasing peak instead of an increasing one as it is presented for all other years.

3.1.2 Vehicule Registration by Fuel time over time

Code
# Filter df_v for specific fuel types and vehicle type
filtered_df <- df_v %>%
  filter(Fuel %in% c("Petrol", "Diesel", "Conventional hybrid", "Plug-in hybrid", "Electricity") &
         VehicleType == "Passenger car")

# Group by Date and Fuel type, and summarize the count
fuel_type_trends <- filtered_df %>%
  group_by(Date, Fuel) %>%
  summarize(Count = sum(Count, na.rm = TRUE)) %>%
  ungroup()

# Plotting the trends over time by fuel type
ggplot(fuel_type_trends, aes(x = Date, y = Count, color = Fuel)) +
  geom_line(stat = "smooth", se = FALSE, method = "loess", span = 0.1, size = 1) +
  labs(title = "Vehicle Registrations by Fuel Type Over Time",
       x = "Date",
       y = "Number of Vehicles Registered") +
  theme_minimal() +
  theme(legend.position = "bottom")
#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
#> i Please use `linewidth` instead.

The graph above shows the vehicle registrations in Switzerland by fuel type from 2005 to 2023.

On the one hand, it highlights the recent decrease in new registrations of thermic vehicles since 2017. On the other hand, the three EVs named as Conventional and Plug-in hybrids along with electricity ones are gradually increasing since approximately the same period. (In fact, fully electric cars and Conventional hybrids have even reached a higher level of vehicle registration in the past years, spotlighting the EV tendency in recent years.)

3.1.3 Map

3.1.4.1 Count of Electricity car Registration for all years per cantons

swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp contains the boundaries of the cantons of Switzerland. ::: {.cell layout-align=“center”}

Code
df_v_map <- df_v |>
  filter(!Location %in% c("Switzerland", "Confederation"))
# Mapping between canton abbreviations and numbers
# Define a vector of the abbreviation values in the desired order
abbreviation_values <- c("ZH", "BE", "LU", "UR", "SZ", "OW", "NW", "GL", "ZG", "FR", "SO", "BS", "BL", "SH", "AR", "AI", "SG", "GR", "AG", "TG", "TI", "VD", "VS", "NE", "GE", "JU")

# Use the match function to create the new column
df_v_map$KANTONSNUM <- match(df_v_map$Location, abbreviation_values)

swiss_cantons <- st_read("../data/CH_map/swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp")
#> Reading layer `swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET' from data source `C:\Users\UrsHu\Pillars\Learn\Academic\Master\Semestre 1\Data Science\0_group_project\dsfba_project\data\CH_map\swissBOUNDARIES3D_1_4_TLM_KANTONSGEBIET.shp' 
#>   using driver `ESRI Shapefile'
#> Simple feature collection with 50 features and 20 fields
#> Geometry type: POLYGON
#> Dimension:     XYZ
#> Bounding box:  xmin: 2490000 ymin: 1080000 xmax: 2830000 ymax: 1300000
#> z_range:       zmin: 193 zmax: 4610
#> Projected CRS: CH1903+ / LV95 + LN02 height

# Aggregate the data
preprocessed_data <- df_v_map %>%
  group_by(Fuel, VehicleType, Year = format(as.Date(Date), "%Y"), KANTONSNUM) %>%
  summarize(Total = sum(Count), .groups = 'drop')

# Ensure KANTONSNUM is correctly formatted in both datasets
df_v_map$KANTONSNUM <- as.numeric(as.character(df_v_map$KANTONSNUM))
swiss_cantons$KANTONSNUM <- as.numeric(as.character(swiss_cantons$KANTONSNUM))

# Filter and aggregate df_v data
filtered_data <- df_v_map %>%
  filter(Fuel == "Electricity", VehicleType == "Passenger car") %>%
  group_by(KANTONSNUM) %>%
  summarize(Total = sum(Count))

# Merge with shapefile data
map_data <- merge(swiss_cantons, filtered_data, by = "KANTONSNUM")

# Create a ggplot
p <- ggplot() +
  geom_sf(data = map_data, aes(fill = Total), color = NA) +
  scale_fill_viridis_c() +
  theme_minimal()

# Convert to interactive plotly plot
ggplotly(p)

:::

3.2 Google Trend

Code
ggplot(df_gtrends, aes(x = Date, y = SearchRatio)) +
  geom_line(color = "darkgreen", size = 1, stat='smooth', se = FALSE, method = "loess", span = 0.1, size = 1) +
  labs(x = "Date", y = "Google Search", title = "Google search About EV in Switzerland")
#> Warning: Duplicated aesthetics after name standardisation: size

Another interesting point to look out is the Google search about EV engine in Switzerland. As we can see, it seems to have skyrocketed since 2016. This allows us to get an overview of the Swiss population interest concerning this topic and comfort us towards our initial predictions.

3.3 Oil

Code
ggplot(oil_df, aes(x = Date , y = Price)) +
  geom_line(color = "darkred", size = 1) +
  labs(x = "Date", y = "Price", title = "Oil Price Over Time")

The above chart represents the oil price evolution through the last two decades. We can observe a quite high degree of volatility concerning the oil valuation. Nevertheless, it seems important to highlight that this value has greatly increased since the past 3 years.

3.4 Demographics

Code
demographic_data <- df_demographic %>%
  mutate(Population = `Generation Z` + `Millenials` + `Generation X` + `Baby Boomers`)

# Plotting the data
ggplot(demographic_data, aes(x = Year)) +
  geom_line(aes(y = `Generation Z`, color = "Generation Z"), size = 1) +
  geom_line(aes(y = Millenials, color = "Millenials"), size = 1) +
  geom_line(aes(y = `Generation X`, color = "Generation X"), size = 1) +
  geom_line(aes(y = `Baby Boomers`, color = "Baby Boomers"), size = 1) +
  labs(title = "Demographic Trends in Switzerland",
       x = "Year",
       y = "Population",
       color = "Generation") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_color_manual(values = c("Generation Z" = "blue", "Millenials" = "red", "Generation X" = "green", "Baby Boomers" = "purple"))

This graph concentrates on the demographic trends’ evolution for each segment of age abovementioned throughout the time. It points out the important overall rise of individuals living in Switzerland over the past two decades.

3.5 French vehicles

3.5.1 Total vehicules evolution France

Code
# Select relevant columns for plotting
registration_data <- df_v_fr[, c("Date", "Diesel", "Essence", "Conventional_Hybrid", "Plug_in_Hybrid", "Electrique")]

# Plotting log-scale evolution of car registrations
ggplot(registration_data, aes(x = Date)) +
  geom_line(aes(y = log(Diesel), color = "Diesel")) +
  geom_line(aes(y = log(Essence), color = "Essence")) +
  geom_line(aes(y = log(Conventional_Hybrid), color = "Conventional Hybrid")) +
  geom_line(aes(y = log(Plug_in_Hybrid), color = "Plug-in Hybrid")) +
  geom_line(aes(y = log(Electrique), color = "Electrique")) +
  scale_color_manual(values = c("Diesel" = "red", "Essence" = "blue", 
                                 "Conventional Hybrid" = "green", "Plug-in Hybrid" = "purple", 
                                 "Electrique" = "orange")) +
  labs(x = "Date", y = "Log-Scale Count", color = "Fuel Type") +
  theme_minimal()

3.5.2 Deltas evolution

Code
# Plotting for total cars
ggplot(df_v_fr, aes(x = Date, y = Diesel_delta, color = "Diesel")) +
  geom_line(size = 1) +
  geom_line(aes(y = Essence_delta, color = "Petrol"), size = 1) +
  geom_line(aes(y = Conventional_Hybrid_delta, color = "Conventional hybrid"), size = 1) +
  geom_line(aes(y = Plug_in_Hybrid_delta, color = "Plug-in hybrid"), size = 1) +
  geom_line(aes(y = Electrique_delta, color = "Electricity"), size = 1) +
  labs(title = "Evolution of cars registered in France over the years by fuel type (Deltas)",
       x = "Years",
       y = "Value",
       color = "Fuel Category") +
  theme_minimal() +
  scale_color_manual(values = c("Diesel" = "red", "Petrol" = "blue", 
                                 "Conventional hybrid" = "green", "Plug-in hybrid" = "purple",
                                 "Electricity" = "orange"))

3.6 Swiss vs France

3.6.1 Electric vs Hybrid vs Petrol

Here we compare and visualize specific vehicle fuel types between Swiss and French datasets.

The thicker line are the for Switzerland, the others are for France. We standardized the counts in both the Swiss and French datas ets for comparison purposes. Please juggle with the interactive plot to make it more readable. ::: {.cell layout-align=“center”}

Code
# Filtering Swiss data for specific fuel types
swiss_specific_fuel <- df_v %>%
  filter(Fuel %in% c("Diesel", "Electricity", "Conventional hybrid", "Plug-in hybrid", "Petrol")) %>%
  filter(Location == 'Switzerland') |>
  filter(VehicleType == 'Passenger car') |>
  filter(Date > as.Date('2012-01-01')) |>
  filter(Date < as.Date('2021-12-31'))

# Selecting equivalent columns from the French dataset
french_specific_fuel <- df_v_fr %>%
  select(Date, Diesel_delta, Essence_delta, Conventional_Hybrid_delta, Plug_in_Hybrid_delta, Electrique_delta) # Adjust column names accordingly

# Reshape French dataset to long format for easier plotting
french_specific_fuel_long <- french_specific_fuel %>%
  pivot_longer(cols = -Date, names_to = "Fuel", values_to = "Count")

# Standardize counts in each dataset
swiss_specific_fuel <- swiss_specific_fuel %>%
  mutate(Count = scale(Count))

french_specific_fuel_long <- french_specific_fuel_long %>%
  mutate(Count = scale(Count))

# Rename the 'Fuel' column in the French dataset
french_specific_fuel_long <- french_specific_fuel_long %>%
  mutate(Fuel = case_when(
    Fuel == "Diesel_delta" ~ "Diesel",
    Fuel == "Essence_delta" ~ "Petrol",
    Fuel == "Conventional_Hybrid_delta" ~ "Conventional hybrid",
    Fuel == "Plug_in_Hybrid_delta" ~ "Plug-in hybrid",
    Fuel == "Electrique_delta" ~ "Electricity"
  ))

# Define color palette for each fuel type
fuel_colors <- c("Diesel" = "black", "Electricity" = "green", "Conventional hybrid" = "purple", "Plug-in hybrid" = "blue", "Petrol" = "orange")

p <- ggplot() +
  geom_smooth(data = swiss_specific_fuel, aes(x = Date, y = Count, color = Fuel), 
              method = "loess", se = FALSE, size = 1.5) +
  geom_line(data = french_specific_fuel_long, aes(x = Date, y = Count, color = Fuel), 
            alpha = 0.4, size = 0.8) +
  scale_color_manual(values = fuel_colors, 
                     labels = c("Diesel", "Electricity", "Conventional hybrid", 
                                "Plug-in hybrid", "Petrol"),
                     breaks = c("Diesel", "Electricity", "Conventional hybrid", 
                                "Plug-in hybrid", "Petrol")) +
  labs(x = "Date", y = "Standardized Count", color = "Fuel Type") +
  theme_minimal() +
  geom_text(data = data.frame(x = as.Date("2021-01-01"), y = c(3, 2.8), 
                              label = c("Switzerland has", "the thickest line")), 
            aes(x = x, y = y, label = label, color = label), 
            size = 4, show.legend = FALSE)

# Convert to interactive plot
interactive_plot <- ggplotly(p, width = 600, height = 400)
interactive_plot

:::

3.8 EV and Oil Price

The results of the graphic comparing oil price and EVs evolution comforts us in our choice of oil price as an explicative variable. Indeed, the tendency is quite similar since 2020. In fact, the previous years refer to a period where EVs were not as commercialized as today. Obviously, we are aware of the numerous other variables explaining both oil price and EVs rise over time.

Code
df_oil_monthly <- oil_df %>% 
                  mutate(Date = as.Date(format(Date, "%Y-%m-01"))) %>%
                  group_by(Date) %>% 
                  summarize(Price = mean(Price))

df_electric_vehicles_monthly <- df_electric_vehicles_agg %>% 
                               mutate(Date = as.Date(format(Date, "%Y-%m-01"))) %>%
                               group_by(Date) %>% 
                               summarize(Count = sum(Count))

# Merge datasets
df_merged <- full_join(df_electric_vehicles_monthly, df_oil_monthly, by = "Date")

# Calculate the ratio for the secondary axis
max_count <- max(df_merged$Count, na.rm = TRUE)
max_price <- max(df_merged$Price, na.rm = TRUE)
ratio <- max_count / max_price

# Plotting with smoothing and color changes
ggplot(df_merged, aes(x = Date)) +
  geom_smooth(aes(y = Count, color = "Electric Vehicles Smoothed"), method = "loess", span = 0.2) +
  geom_line(aes(y = Price * ratio, color = "Oil Price")) +
  scale_y_continuous(
    "Number of Electric Vehicles",
    sec.axis = sec_axis(~ . / ratio, name = "Oil Price")
  ) +
  labs(title = "Comparison of Electric Vehicle Rise and Oil Prices Over Time",
       x = "Date", color = "Legend") +
  scale_color_manual(values = c("Electric Vehicles Smoothed" = "blue", "Oil Price" = "darkred")) +
  theme_minimal()
#> Warning: Removed 1 row containing missing values (`geom_line()`).

4 Analysis

TO DO’s

  • Answers to the research questions
  • Different methods considered
  • Competing approaches
  • Justifications

4.1 RQ1

4.2 RQ2

4.3 RQ3

4.4 RQç

4.5 RQ5

5 Conclusion

TO DO’s

  • Take home message
  • Limitations
  • Future work?